/ Technotools / Technotools (Chestnut CD-ROM)(1993).ISO / lang_oth / libry31a / libry7.doc

Text File | 1987-01-20 | 8KB | 132 lines

.pa
VECTOR EMULATIONS

Vector emulations are software procedures that mimic the operation of
vector processing hardware. Of course, the software is not based on the
same principle as the hardware; but the concept is the same: specific
procedures designed to perform similar repetitive tasks on contiguously
stored real numbers as efficiently as possible. No, I won't tell you how
I do it, so don't ask.

My vector emulations are completely compatible with Hewlett-Packard's
Vector Instruction Set (VIS). They have the same calling syntax and
function (that's why I developed them in the first place: for downloading
programs from an HP-1000F). HP has a very nice manual with examples. If
you are interested, perhaps they would sell you one (I wouldn't even
hazard a guess as to the cost).

     Vector Instruction Set (VIS) User's Manual
     Part No. 12824-90001
     Hewlett-Packard Company
     Data Systems Division
     11000 Wolfe Road
     Cupertino, CA 95014

You do not need a math coprocessor (Intel 8087/80287) in order to run a
program linked with LIBRY.LIB; but it makes a TREMENDOUS difference (a
factor of 120 or so for floating point operations). The vector emulations
will run even without a math coprocessor; but in that case the speed is
already so slow that nothing will help.

The improvement in speed with the vector emulations varies depending on
the relative speed of your processor and coprocessor. The greatest
improvement is realized on a PC with a 5MHz-8086/5MHz-8087 pair; and the
least improvement is realized on an AT with an 8MHz-80286/5MHz-80287 pair.

Note that the increments (INCR1, INCR2, INCR3), the index (M), and the
count (N) are of the type INTEGER*2. Reals are of the type REAL*4 and
double precision reals are of the type REAL*8. There can be no mixing of
REAL*4 and REAL*8 types in the same emulation. To get double precision
use "CALL DVABS(...)" rather than "CALL VABS(...)".

It is very important to BE SURE THAT NO VECTOR CROSSES A SEGMENT BOUNDARY
(refer to Microsoft FORTRAN manual section 8).
What this means to the machine is that a vector must reside within a
single segment (65536 bytes), or the emulation cannot address all of the
elements as a group. To make sure this is the case, NEVER use the $LARGE
metacommand. If you have no COMMON, you never have to worry about this.
If you do have COMMON, make sure that each COMMON contains no more than
65536 bytes. Of course, you can have several named COMMONs, so this is
not too restrictive a limit on your programs.

Also, if more than one vector is passed to the emulator, they need not
reside in the same segment. For instance, you can add one real vector
with 16384 elements to another with 16384 elements and store the result
in a third, as long as they are all in different COMMONs (16384 REAL*4
elements at 4 bytes each fill one 65536-byte segment exactly). Of course,
you can add two vectors in the same COMMON provided their total number of
elements does not exceed 16384. There is a way of getting around this;
but it is too involved to explain here.

A word of warning... vector emulations do not like being interrupted.
This is the whole point of "speed at any cost" procedures. For this
reason, the emulations may interfere with the operation of some "pop-up"
programs and such things as windowing and multi-tasking. This is
regrettably unpredictable. I can say that the emulations don't interfere
with any of the "pop-up" programs that I have developed (like my DOS
command stack full-screen editor and improved scroller) that "lurk" in
the background; but I don't know about such programs that others have
developed.
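
The segment rule above comes down to simple arithmetic on element counts
and element sizes. The following is a minimal sketch of that arithmetic
and of a COMMON layout that follows the rule. The names FITSEG and FILLAB
and the COMMON names VECA, VECB, and VECC are hypothetical, chosen for
this illustration only; nothing here is part of LIBRY.LIB.

```fortran
C     FITSEG and FILLAB are hypothetical names used only for this
C     sketch; neither is part of LIBRY.LIB.
C
C     FITSEG is .TRUE. when NELEM elements of NBYTES bytes each fit
C     in one 65536-byte segment.  INTEGER*4 is used here because
C     the value 65536 does not fit in an INTEGER*2.
      LOGICAL FUNCTION FITSEG(NELEM,NBYTES)
      INTEGER*4 NELEM,NBYTES
      FITSEG=(NELEM*NBYTES).LE.65536
      RETURN
      END
C
C     Three REAL*4 vectors of 16384 elements (65536 bytes each),
C     each placed in its own named COMMON as the text recommends.
      SUBROUTINE FILLAB
      INTEGER*2 I
      REAL*4 A,B,C
      COMMON /VECA/ A(16384)
      COMMON /VECB/ B(16384)
      COMMON /VECC/ C(16384)
      DO 10 I=1,16384
      A(I)=1.0
      B(I)=2.0
      C(I)=0.0
   10 CONTINUE
      RETURN
      END
```

By the same arithmetic, a COMMON holding REAL*8 vectors is limited to
8192 elements in total, since 8192 elements at 8 bytes each also come to
65536 bytes.
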
.pa
SAMPLE FORTRAN EQUIVALENT OF A VECTOR ADD

      SUBROUTINE VADD(V1,INCR1,V2,INCR2,V3,INCR3,N)
C
C     VECTOR V3=V1+V2
C
      IMPLICIT INTEGER*2 (I-N)
      IMPLICIT REAL*4 (A-H,O-Z)
      DIMENSION V1(N),V2(N),V3(N)
C
      IF(N.LT.1) GO TO 999
      I1=1
      I2=1
      I3=1
C
      DO 100 I=1,N
      V3(I3)=V1(I1)+V2(I2)
      I1=I1+INCR1
      I2=I2+INCR2
  100 I3=I3+INCR3
C
  999 RETURN
      END

.pa
.ft c
.in 15
SUMMARY OF VECTOR INSTRUCTION SET

------------------------------------------------------------------------------------------------------------------
                                                                   SPEED IMPROVEMENT:   PC WITH 8087      HP-1000F
CALLING SYNTAX                             OPERATION                   VECTOR LENGTH:    N=10  N=100   N=10  N=100
------------------------------------------------------------------------------------------------------------------
CALL VABS(V1,INCR1,V2,INCR2,N)             (V2(I)=ABS(V1(I)),I=1,N)                       4.0    7.5    4.5    5.1
CALL VADD(V1,INCR1,V2,INCR2,V3,INCR3,N)    (V3(I)=V1(I)+V2(I),I=1,N)                      3.3    3.8    4.9    4.8
CALL VDIV(V1,INCR1,V2,INCR2,V3,INCR3,N)    (V3(I)=V1(I)/V2(I),I=1,N)                      2.5    3.2    5.0    5.7
CALL VDOT(S,V1,INCR1,V2,INCR2,N)           S=SUM(V1(I)*V2(I),I=1,N)                       4.0    4.8    3.5    3.6
CALL VMAB(M,V1,INCR1,N)                    V1(M)=AMAX1(ABS(V1(I)),I=1,N)                  3.5    4.4    3.6    3.6
CALL VMAX(M,V1,INCR1,N)                    V1(M)=AMAX1(V1(I),I=1,N)                       3.5    3.3    4.2    4.4
CALL VMIB(M,V1,INCR1,N)                    V1(M)=AMIN1(ABS(V1(I)),I=1,N)                  3.8    4.8    3.7    3.2
CALL VMIN(M,V1,INCR1,N)                    V1(M)=AMIN1(V1(I),I=1,N)                       3.5    3.5    4.2    3.9
CALL VMOV(V1,INCR1,V2,INCR2,N)             (V2(I)=V1(I),I=1,N)                            3.3    9.0    5.2    6.5
CALL VMPY(V1,INCR1,V2,INCR2,V3,INCR3,N)    (V3(I)=V1(I)*V2(I),I=1,N)                      3.5    3.8    4.8    4.7
CALL VNRM(S,V1,INCR1,N)                    S=SUM(ABS(V1(I)),I=1,N)                        5.3    4.7    3.8    4.5
CALL VPIV(S,V1,INCR1,V2,INCR2,V3,INCR3,N)  (V3(I)=S*V1(I)+V2(I),I=1,N)                    3.4    3.5    4.6    5.2
CALL VSAD(S,V1,INCR1,V2,INCR2,N)           (V2(I)=S+V1(I),I=1,N)                          3.4    4.0    3.7    4.2
CALL VSDV(S,V1,INCR1,V2,INCR2,N)           (V2(I)=S/V1(I),I=1,N)                          3.0    3.2    4.5    4.4
CALL VSMY(S,V1,INCR1,V2,INCR2,N)           (V2(I)=S*V1(I),I=1,N)                          3.4    4.0    4.0    4.5
CALL VSSB(S,V1,INCR1,V2,INCR2,N)           (V2(I)=S-V1(I),I=1,N)                          3.4    5.3    3.6    4.1
CALL VSUB(V1,INCR1,V2,INCR2,V3,INCR3,N)    (V3(I)=V1(I)-V2(I),I=1,N)                      3.3    3.8    5.5    5.6
CALL VSUM(S,V1,INCR1,N)                    S=SUM(V1(I),I=1,N)                             3.5    4.3    3.5    4.1
CALL VSWP(V1,INCR1,V2,INCR2,N)             (V1(I)<->V2(I),I=1,N)                          3.2    5.0    5.0    5.7
CALL VMIX(INDEX,V1,V2,N)                   (V2(I)=V1(INDEX(I)),I=1,N)                     1.8    2.7     NA     NA
CALL VMXI(INDEX,V1,V2,N)                   (V2(INDEX(I))=V1(I),I=1,N)                     1.8    1.7     NA     NA
CALL CLAMP(VMIN,VMAX,V,N)                  (V(I)=AMAX1(VMIN,AMIN1(VMAX,V(I))),I=1,N)      8.0    9.0     NA     NA
H=HORNER(C,X,N)                            H=SUM(C(I)*X**(I-1),I=1,N)                     3.5    4.3     NA     NA
------------------------------------------------------------------------------------------------------------------

The above table shows, for instance, that an emulated add of two vectors
having length 100 is 3.8 times as fast as the same operation in FORTRAN
on a "stock" PC with an 8087 math coprocessor.

note 1: there is little or no improvement for N<10 and runtimes may
        increase for N<5.
note 2: for double precision add a "D" prefix (e.g. DVABS, DVADD, ...,
        DCLAMP, DHORNE).
note 3: vectors must not cross a segment boundary (see section 8 of
        Microsoft FORTRAN user's guide).
note 4: all integers (e.g. INCR1,INCR2,INCR3,N...) are of the INTEGER*2
        type.
note 5: increments (viz. INCR1,INCR2,INCR3) can be positive, negative, or
        zero.
.ft e
.in 10
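
The increments in note 5 and the SUM-style entries in the table may be
easier to follow from plain FORTRAN equivalents written by analogy with
the VADD sample. What follows is a sketch under stated assumptions, not
the library's code. The names XVDOT and XHORN are hypothetical, chosen so
they do not clash with VDOT and HORNER. The indexing copies the VADD
sample (each vector starts at element 1 and steps by its increment), so
only positive and zero increments are exercised here. Returning zero when
N<1 is an assumption, and so is the use of Horner's rule to evaluate the
sum shown in the table.

```fortran
C     XVDOT and XHORN are hypothetical names for this sketch only.
C
C     Plain FORTRAN equivalent of S=SUM(V1(I)*V2(I),I=1,N) with
C     increments, indexed the same way as the VADD sample.
      SUBROUTINE XVDOT(S,V1,INCR1,V2,INCR2,N)
      IMPLICIT INTEGER*2 (I-N)
      IMPLICIT REAL*4 (A-H,O-Z)
      DIMENSION V1(*),V2(*)
C     Assumption: the result is zero when N is less than 1.
      S=0.0
      IF(N.LT.1) GO TO 999
      I1=1
      I2=1
      DO 100 I=1,N
      S=S+V1(I1)*V2(I2)
      I1=I1+INCR1
      I2=I2+INCR2
  100 CONTINUE
  999 RETURN
      END
C
C     Plain FORTRAN equivalent of H=SUM(C(I)*X**(I-1),I=1,N),
C     evaluated by Horner's rule: start from C(N), then multiply
C     by X and add the next lower coefficient down to C(1).
      REAL*4 FUNCTION XHORN(C,X,N)
      IMPLICIT INTEGER*2 (I-N)
      IMPLICIT REAL*4 (A-H,O-Z)
      DIMENSION C(*)
C     Assumption: the result is zero when N is less than 1.
      XHORN=0.0
      IF(N.LT.1) GO TO 999
      H=C(N)
      DO 100 K=1,N-1
      H=H*X+C(N-K)
  100 CONTINUE
      XHORN=H
  999 RETURN
      END
```

With a 3 by 3 REAL*4 matrix A stored by columns and INTEGER*2 variables
INC3=3, INC1=1, and N=3, the call CALL XVDOT(S,A(2,1),INC3,V,INC1,N)
forms the dot product of row 2 of A with V. Under the same indexing, an
increment of zero reuses the first element of that vector for every term.
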